Results 1 - 20 of 1,296
1.
PeerJ Comput Sci ; 10: e1917, 2024.
Article in English | MEDLINE | ID: mdl-38660196

ABSTRACT

Heart disease is one of the primary causes of morbidity and death worldwide. Millions of people suffer heart attacks every year, and only early-stage prediction can help reduce that number. Researchers are working on designing and developing early-stage prediction systems using different advanced technologies, and machine learning (ML) is one of them. Almost all existing ML-based works consider the same dataset (intra-dataset) for the training and validation of their method. In particular, they do not consider inter-dataset performance checks, where different datasets are used in the training and testing phases. In the inter-dataset setup, existing ML models show poor performance, known as the inter-dataset discrepancy problem. This work focuses on mitigating the inter-dataset discrepancy problem by considering five available heart disease datasets and their combined form. All potential training and testing mode combinations are systematically executed to assess discrepancies before and after applying the proposed methods. Imbalanced-data handling using SMOTE-Tomek, feature selection using random forest (RF), and feature extraction using principal component analysis (PCA) with a long preprocessing pipeline are used to mitigate the inter-dataset discrepancy problem. The preprocessing pipeline builds on missing-value handling using RF regression, log transformation, outlier removal, normalization, and data balancing, which together make the datasets more amenable to ML. Support vector machine, K-nearest neighbors, decision tree, RF, eXtreme Gradient Boosting, Gaussian naive Bayes, logistic regression, and multilayer perceptron are used as classifiers. Experimental results show that feature selection and classification using RF produce better results than other combination strategies in both single- and inter-dataset setups.
In certain configurations of individual datasets, RF demonstrates 100% accuracy and 96% accuracy during the feature selection phase in an inter-dataset setup, exhibiting commendable precision, recall, F1 score, specificity, and AUC score. The results indicate that an effective preprocessing technique has the potential to improve the performance of the ML model without necessitating the development of intricate prediction models. Addressing inter-dataset discrepancies introduces a novel research avenue, enabling the amalgamation of identical features from various datasets to construct a comprehensive global dataset within a specific domain.
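The inter-dataset protocol described above, training on one dataset and testing on another, can be outlined in a few lines. The sketch below is illustrative only: it uses a trivial 1-nearest-neighbour classifier rather than the authors' RF pipeline, and the function names are hypothetical.

```python
import numpy as np

def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """Classify each test point by the label of its nearest training point."""
    preds = []
    for x in X_test:
        idx = np.argmin(np.linalg.norm(X_train - x, axis=1))
        preds.append(y_train[idx])
    return np.mean(np.array(preds) == y_test)

def cross_dataset_matrix(datasets):
    """Evaluate every (train, test) dataset pairing, as in an
    inter-dataset discrepancy check. `datasets` maps name -> (X, y)."""
    names = list(datasets)
    acc = {}
    for tr in names:
        for te in names:
            Xtr, ytr = datasets[tr]
            Xte, yte = datasets[te]
            acc[(tr, te)] = one_nn_accuracy(Xtr, ytr, Xte, yte)
    return acc
```

Off-diagonal entries of the resulting matrix (train on A, test on B) are where the discrepancy problem shows up; diagonal entries correspond to the usual intra-dataset evaluation.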

2.
Heliyon ; 10(7): e28822, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38601671

ABSTRACT

Background: Physiological modelling often involves models described by large numbers of variables and significant volumes of clinical data. Mathematical interpretation of such models frequently necessitates analysing data points in high-dimensional spaces. Existing algorithms for analysing high-dimensional points either lose important dimensionality or do not describe the full position of points. Hence, there is a need for an algorithm which preserves this information. Methods: The most-distant uncovered point (MDUP) hypersphere method is a binary classification approach which defines a collection of equidistant N-dimensional points as the union of hyperspheres. The method iteratively generates a hypersphere at the most distant point in the region of interest not yet contained within any hypersphere, until the entire region of interest is covered by the union of all generated hyperspheres. This method is tested on a 7-dimensional space with up to 35.8 million points representing feasible and infeasible spaces of model parameters for a clinically validated cardiovascular system model. Results: Across different numbers of input points, the MDUP hypersphere method tends to generate large spheres away from the boundary between feasible and infeasible points, and a far greater number of much smaller spheres near the boundary of the region of interest to fill this space. Runtime scales quadratically, in part because the current MDUP implementation is not parallelised. Conclusions: The MDUP hypersphere method can define points in a space of any dimension using only a collection of centre points and associated radii, making the results easily interpretable. It can identify large continuous regions and, in many cases, capture the general structure of a region in relatively few hyperspheres.
The MDUP method also shows promise for initialising optimisation algorithm starting conditions within pre-defined feasible regions of model parameter spaces, which could improve model identifiability and the quality of optimisation results.
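The iteration described in the Methods section can be sketched compactly. The seeding rule and radius choice below (each sphere's radius reaches the nearest infeasible point) are plausible assumptions for illustration, not the published implementation.

```python
import numpy as np

def mdup_cover(feasible, infeasible):
    """Sketch of the most-distant uncovered point (MDUP) idea: repeatedly
    place a hypersphere at the uncovered feasible point farthest from all
    existing sphere centres, with radius reaching the nearest infeasible
    point, until every feasible point is covered."""
    feasible = np.asarray(feasible, float)
    infeasible = np.asarray(infeasible, float)
    centres, radii = [], []
    covered = np.zeros(len(feasible), dtype=bool)
    while not covered.all():
        uncovered = feasible[~covered]
        if not centres:
            # seed at the feasible point deepest inside the feasible region
            d = np.linalg.norm(uncovered[:, None] - infeasible[None], axis=-1).min(axis=1)
        else:
            # otherwise pick the uncovered point most distant from all centres
            d = np.linalg.norm(uncovered[:, None] - np.array(centres)[None], axis=-1).min(axis=1)
        c = uncovered[np.argmax(d)]
        r = np.linalg.norm(infeasible - c, axis=1).min()
        centres.append(c)
        radii.append(r)
        covered |= np.linalg.norm(feasible - c, axis=1) <= r
    return np.array(centres), np.array(radii)
```

The output is exactly the representation the Conclusions describe: a list of centre points and associated radii, interpretable in any dimension.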

3.
Sensors (Basel) ; 24(7)2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38610302

ABSTRACT

With the rapid advancement of remote-sensing technology, the spectral information obtained from hyperspectral remote-sensing imagery has become increasingly rich, facilitating detailed spectral analysis of Earth's surface objects. However, the abundance of spectral information presents certain challenges for data processing, such as the "curse of dimensionality" leading to the "Hughes phenomenon", "strong correlation" due to high resolution, and "nonlinear characteristics" caused by varying surface reflectances. Consequently, dimensionality reduction of hyperspectral data emerges as a critical task. This paper begins by elucidating the principles and processes of hyperspectral image dimensionality reduction based on manifold theory and learning methods, in light of the nonlinear structures and features present in hyperspectral remote-sensing data, and formulates a dimensionality reduction process based on manifold learning. Subsequently, this study explores the capabilities of feature extraction and low-dimensional embedding for hyperspectral imagery using manifold learning approaches, including principal components analysis (PCA), multidimensional scaling (MDS), and linear discriminant analysis (LDA) for linear methods; and isometric mapping (Isomap), locally linear embedding (LLE), Laplacian eigenmaps (LE), Hessian locally linear embedding (HLLE), local tangent space alignment (LTSA), and maximum variance unfolding (MVU) for nonlinear methods, based on the Indian Pines hyperspectral dataset and Pavia University dataset. Furthermore, the paper investigates the optimal neighborhood computation time and overall algorithm runtime for feature extraction in hyperspectral imagery, varying by the choice of neighborhood k and intrinsic dimensionality d values across different manifold learning methods. 
Based on the outcomes of feature extraction, the study examines the classification experiments of various manifold learning methods, comparing and analyzing the variations in classification accuracy and Kappa coefficient with different selections of neighborhood k and intrinsic dimensionality d values. Building on this, the impact of selecting different bandwidths t for the Gaussian kernel in the LE method and different Lagrange multipliers λ for the MVU method on classification accuracy, given varying choices of neighborhood k and intrinsic dimensionality d, is explored. Through these experiments, the paper investigates the capability and effectiveness of different manifold learning methods in feature extraction and dimensionality reduction within hyperspectral imagery, as influenced by the selection of neighborhood k and intrinsic dimensionality d values, identifying the optimal neighborhood k and intrinsic dimensionality d value for each method. A comparison of classification accuracies reveals that the LTSA method yields superior classification results compared to other manifold learning approaches. The study demonstrates the advantages of manifold learning methods in processing hyperspectral image data, providing an experimental reference for subsequent research on hyperspectral image dimensionality reduction using manifold learning methods.
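Of the linear methods compared above, PCA is the simplest to state concretely. The following minimal sketch (not tied to the hyperspectral datasets used in the study) shows the standard eigendecomposition route to a d-dimensional embedding.

```python
import numpy as np

def pca_embed(X, d):
    """Project n samples (rows of X) onto the top-d principal components."""
    Xc = X - X.mean(axis=0)              # centre each band/feature
    cov = Xc.T @ Xc / (len(X) - 1)       # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:d]   # take the top-d components
    return Xc @ vecs[:, order]
```

The nonlinear methods in the comparison (Isomap, LLE, LE, HLLE, LTSA, MVU) replace the global covariance with neighbourhood-graph constructions, which is where the neighbourhood size k enters.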

4.
Front Hum Neurosci ; 18: 1368115, 2024.
Article in English | MEDLINE | ID: mdl-38590363

ABSTRACT

Introduction: Adaptation and learning have been observed to contribute to the acquisition of new motor skills and are used as strategies to cope with changing environments. However, it is hard to determine the relative contribution of each when executing goal directed motor tasks. This study explores the dynamics of neural activity during a center-out reaching task with continuous visual feedback under the influence of rotational perturbations. Methods: Results for a brain-computer interface (BCI) task performed by two non-human primate (NHP) subjects are compared to simulations from a reinforcement learning agent performing an analogous task. We characterized baseline activity and compared it to the activity after rotational perturbations of different magnitudes were introduced. We employed principal component analysis (PCA) to analyze the spiking activity driving the cursor in the NHP BCI task as well as the activation of the neural network of the reinforcement learning agent. Results and discussion: Our analyses reveal that both for the NHPs and the reinforcement learning agent, the task-relevant neural manifold is isomorphic with the task. However, for the NHPs the manifold is largely preserved for all rotational perturbations explored and adaptation of neural activity occurs within this manifold as rotations are compensated by reassignment of regions of the neural space in an angular pattern that cancels said rotations. In contrast, retraining the reinforcement learning agent to reach the targets after rotation results in substantial modifications of the underlying neural manifold. Our findings demonstrate that NHPs adapt their existing neural dynamic repertoire in a quantitatively precise manner to account for perturbations of different magnitudes and they do so in a way that obviates the need for extensive learning.

5.
Sensors (Basel) ; 24(7)2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38610294

ABSTRACT

The rapid development of the Internet of Things (IoT) has brought many conveniences to our daily life. However, it has also introduced various security risks that need to be addressed. The proliferation of IoT botnets is one of these risks. Most researchers have had some success in IoT botnet detection using artificial intelligence (AI). However, they have not considered the impact of dynamic network data streams on the models in real-world environments. Over time, existing detection models struggle to cope with evolving botnets. To address this challenge, we propose an incremental learning approach based on Gradient Boosting Decision Trees (GBDT), called GBDT-IL, for detecting botnet traffic in IoT environments. It improves the robustness of the framework by adapting to dynamic IoT data using incremental learning. Additionally, it incorporates an enhanced Fisher Score feature selection algorithm, which enables the model to achieve high accuracy even with a smaller set of optimal features, thereby reducing the system resources required for model training. To evaluate the effectiveness of our approach, we conducted experiments on the BoT-IoT, N-BaIoT, MedBIoT, and MQTTSet datasets. We compared our method with similar feature selection algorithms and existing concept drift detection algorithms. The experimental results demonstrated that our method achieved an average accuracy of 99.81% using only 25 features, outperforming similar feature selection algorithms. Furthermore, our method achieved an average accuracy of 96.88% in the presence of different types of drifting data, which is 2.98% higher than the best available concept drift detection algorithms, while maintaining a low average false positive rate of 3.02%.
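The abstract's enhanced variant of the Fisher Score is not detailed here, but the classical baseline it builds on ranks each feature by between-class scatter over within-class scatter. A minimal sketch:

```python
import numpy as np

def fisher_scores(X, y):
    """Classical Fisher Score per feature:
    sum_c n_c (mu_cj - mu_j)^2 / sum_c n_c var_cj.
    Higher scores indicate more discriminative features."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

def top_k_features(X, y, k):
    """Indices of the k highest-scoring features."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```

Selecting a small set of high-scoring features before training is what lets an approach like this reduce the resources needed for model (re)training on streaming data.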

6.
Cancers (Basel) ; 16(7)2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38610998

ABSTRACT

Using multi-color flow cytometry analysis, we studied the immunophenotypical differences between leukemic cells from patients with AML/MDS and hematopoietic stem and progenitor cells (HSPCs) from patients in complete remission (CR) following their successful treatment. The panel of markers included CD34, CD38, CD45RA, and CD123 as representatives of a hierarchical hematopoietic stem and progenitor cell (HSPC) classification, as well as programmed death ligand 1 (PD-L1). Rather than restricting the evaluation to a 2- or 3-dimensional analysis, we applied a t-distributed stochastic neighbor embedding (t-SNE) approach to obtain deeper insight and segregation between leukemic cells and normal HSPCs. For that purpose, we created a t-SNE map, which resulted in the visualization of 27 cell clusters based on their similarity concerning the composition and intensity of antigen expression. Two of these clusters were "leukemia-related", containing a great proportion of CD34+/CD38- hematopoietic stem cells (HSCs) or CD34+ cells with a strong co-expression of CD45RA/CD123, respectively. CD34+ cells within the latter cluster were also highly positive for PD-L1, reflecting their immunosuppressive capacity. Beyond this proof-of-principle study, the inclusion of additional markers will be helpful to refine the differentiation between normal HSPCs and leukemic cells, particularly in the context of minimal disease detection and antigen-targeted therapeutic interventions. Furthermore, we suggest a protocol for the assignment of new cell ensembles in quantitative terms, via a numerical value, the Pearson coefficient, based on a similarity comparison of the t-SNE pattern with a reference.
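The quantitative assignment step suggested at the end rests on the ordinary Pearson correlation coefficient between two marker profiles. A self-contained sketch (the profile inputs here are hypothetical, not the study's t-SNE patterns):

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long marker-intensity profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

A value near 1 would indicate that a new cell ensemble closely matches the reference pattern; values near 0 or below indicate dissimilarity.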

7.
J Biomed Inform ; : 104641, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38642627

ABSTRACT

OBJECTIVE: Clinical trials involve the collection of a wealth of data, comprising multiple diverse measurements performed at baseline and follow-up visits over the course of a trial. The most common primary analysis is restricted to a single, potentially composite endpoint at one time point. While such an analytical focus promotes simple and replicable conclusions, it does not necessarily fully capture the multi-faceted effects of a drug in a complex disease setting. Therefore, to complement existing approaches, we set out here to design a longitudinal multivariate analytical framework that accepts as input an entire clinical trial database, comprising all measurements, patients, and time points across multiple trials. METHODS: Our framework composes probabilistic principal component analysis with a longitudinal linear mixed effects model, thereby enabling clinical interpretation of multivariate results, while handling data missing at random, and incorporating covariates and covariance structure in a computationally efficient and principled way. RESULTS: We illustrate our approach by applying it to four phase III clinical trials of secukinumab in Psoriatic Arthritis (PsA) and Rheumatoid Arthritis (RA). We identify three clinically plausible latent factors that collectively explain 74.5% of empirical variation in the longitudinal patient database. We estimate longitudinal trajectories of these factors, thereby enabling joint characterisation of disease progression and drug effect. We perform benchmarking experiments demonstrating our method's competitive performance at estimating average treatment effects compared to existing statistical and machine learning methods, and showing that our modular approach leads to relatively computationally efficient model fitting. 
CONCLUSION: Our multivariate longitudinal framework has the potential to illuminate the properties of existing composite endpoint methods, and to enable the development of novel clinical endpoints that provide enhanced and complementary perspectives on treatment response.
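The probabilistic PCA component of the framework has a well-known closed-form maximum-likelihood solution (Tipping and Bishop). The sketch below shows only that component; composing it with the longitudinal linear mixed-effects model, as the authors do, is not shown.

```python
import numpy as np

def ppca_fit(X, q):
    """Closed-form ML probabilistic PCA: model x ~ N(mu, W W^T + sigma2 I).
    The noise variance is the average of the discarded eigenvalues."""
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    vals, vecs = vals[::-1], vecs[:, ::-1]   # sort descending
    sigma2 = vals[q:].mean()
    W = vecs[:, :q] * np.sqrt(np.maximum(vals[:q] - sigma2, 0.0))
    return mu, W, sigma2
```

Because the model is probabilistic, missing-at-random entries can be handled in the E-step of an EM variant rather than by imputation, which is one reason the framework can accept an entire trial database as input.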

8.
Genome Biol ; 25(1): 89, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589921

ABSTRACT

Advancements in cytometry technologies have enabled quantification of up to 50 proteins across millions of cells at single cell resolution. Analysis of cytometry data routinely involves tasks such as data integration, clustering, and dimensionality reduction. While numerous tools exist, many require extensive run times when processing large cytometry data containing millions of cells. Existing solutions, such as random subsampling, are inadequate as they risk excluding rare cell subsets. To address this, we propose SuperCellCyto, an R package that builds on the SuperCell tool which groups highly similar cells into supercells. SuperCellCyto is available on GitHub ( https://github.com/phipsonlab/SuperCellCyto ) and Zenodo ( https://doi.org/10.5281/zenodo.10521294 ).


Subjects
Research, Single-Cell Analysis, Cluster Analysis, Software
9.
Biomed Environ Sci ; 37(2): 146-156, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38582977

ABSTRACT

Objective: This study aimed to explore the association of single nucleotide polymorphisms (SNPs) in the matrix metalloproteinase 2 (MMP-2) signaling pathway with the risk of vascular senescence (VS). Methods: In this cross-sectional study, between May and November 2022, peripheral venous blood of 151 VS patients (case group) and 233 volunteers (control group) was collected. Fourteen SNPs were identified in five genes encoding components of the MMP-2 signaling pathway; VS was assessed through carotid-femoral pulse wave velocity (cfPWV), and associations were analyzed using multivariate logistic regression. The multigene influence on the risk of VS was assessed using multifactor dimensionality reduction (MDR) and generalized multifactor dimensionality reduction (GMDR) modeling. Results: Within the multivariate logistic regression models, four SNPs were found to have significant associations with VS: chemokine (C-C motif) ligand 2 (CCL2) rs4586, MMP2 rs14070, MMP2 rs7201, and MMP2 rs1053605. Carriers of the T/C genotype of MMP2 rs14070 had a 2.17-fold increased risk of developing VS compared with those of the C/C genotype, and those of the T/T genotype had a 19.375-fold increased risk. CCL2 rs4586 and MMP-2 rs14070 exhibited the most significant interactions. Conclusion: CCL2 rs4586, MMP-2 rs14070, MMP-2 rs7201, and MMP-2 rs1053605 polymorphisms were significantly associated with the risk of VS.
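Fold-increase risks of the kind reported here are typically odds ratios computed from a genotype-by-outcome 2x2 table. A minimal sketch with hypothetical counts (not the study's data):

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio from a 2x2 table: (a*d) / (b*c)."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
```

In a logistic regression model, the exponentiated coefficient of the genotype term gives the same quantity adjusted for covariates.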


Subjects
Matrix Metalloproteinase 2, Single Nucleotide Polymorphism, Humans, Case-Control Studies, Cross-Sectional Studies, Genetic Predisposition to Disease, Genotype, Matrix Metalloproteinase 2/genetics, Matrix Metalloproteinase 2/metabolism, Pulse Wave Analysis, Signal Transduction
10.
J Comput Appl Math ; 445, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38464901

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for the visualization of popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).
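Plain NMF, the baseline that TNMF and rTNMF regularize, is commonly fitted with the Lee-Seung multiplicative updates. The sketch below shows that baseline only; the persistent-Laplacian regularization terms of the proposed methods are not included.

```python
import numpy as np

def nmf(V, k, iters=500, seed=0):
    """Lee-Seung multiplicative updates for V ~= W @ H with W, H >= 0
    (Frobenius objective). Rows of H act as the 'meta-genes'."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H
```

The multiplicative form guarantees that W and H stay nonnegative, which is what gives the factors their meta-gene interpretation in scRNA-seq applications.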

11.
Proc Natl Acad Sci U S A ; 121(10): e2319491121, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38427601

ABSTRACT

Translocation of cytoplasmic molecules to the plasma membrane is commonplace in cell signaling. Membrane localization has been hypothesized to increase intermolecular association rates; however, it has also been argued that association should be faster in the cytosol because membrane diffusion is slow. Here, we directly compare an identical association reaction, the binding of complementary DNA strands, in solution and on supported membranes. The measured rate constants show that for a 10-µm-radius spherical cell, association is 22- to 33-fold faster at the membrane than in the cytoplasm. The kinetic advantage depends on cell size and is essentially negligible for typical ~1 µm prokaryotic cells. The rate enhancement is attributable to a combination of higher encounter rates in two dimensions and a higher reaction probability per encounter.


Subjects
Signal Transduction, Cytoplasm/metabolism, Cell Membrane/metabolism, Cytosol/metabolism, Membranes, Kinetics
12.
Entropy (Basel) ; 26(3)2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38539730

ABSTRACT

Catchment classification plays an important role in many applications associated with water resources and environment. In recent years, several studies have applied the concepts of nonlinear dynamics and chaos for catchment classification, mainly using dimensionality measures. The present study explores prediction as a measure for catchment classification, through application of a nonlinear local approximation prediction method. The method uses the concept of phase-space reconstruction of a time series to represent the underlying system dynamics and identifies nearest neighbors in the phase space for system evolution and prediction. The prediction accuracy measures, as well as the optimum values of the parameters involved in the method (e.g., phase space or embedding dimension, number of neighbors), are used for classification. For implementation, the method is applied to daily streamflow data from 218 catchments in Australia, and predictions are made for different embedding dimensions and number of neighbors. The prediction results suggest that phase-space reconstruction using streamflow alone can provide good predictions. The results also indicate that better predictions are achieved for lower embedding dimensions and smaller numbers of neighbors, suggesting possible low dimensionality of the streamflow dynamics. The classification results based on prediction accuracy are found to be useful for identification of regions/stations with higher predictability, which has important implications for interpolation or extrapolation of streamflow data.
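The local approximation prediction procedure described above (delay embedding, then averaging the successors of the nearest phase-space neighbours) can be sketched as follows. This is an illustrative zeroth-order local model under assumed defaults, not the study's exact implementation.

```python
import numpy as np

def embed(x, m, tau=1):
    """Delay-embed a scalar series into m-dimensional phase-space vectors."""
    n = len(x) - (m - 1) * tau
    return np.array([x[i : i + m * tau : tau] for i in range(n)])

def local_predict(x, m, k, tau=1):
    """Predict the next value of the series from the k nearest phase-space
    neighbours of the current state (zeroth-order local approximation)."""
    Y = embed(x, m, tau)
    current, history = Y[-1], Y[:-1]
    # next-value targets aligned with each historical state
    targets = x[(m - 1) * tau + 1 : len(x)]
    d = np.linalg.norm(history - current, axis=1)
    nearest = np.argsort(d)[:k]
    return targets[nearest].mean()
```

Scanning the embedding dimension m and neighbour count k, as done for the 218 Australian catchments, then turns prediction accuracy itself into the classification measure.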

13.
Int J Mol Sci ; 25(6)2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38542456

ABSTRACT

This study investigates the roles of mucosal-associated invariant T (MAIT) cells and Vα7.2+/CD161- T cells in skin diseases, focusing on atopic dermatitis. MAIT cells, crucial for bridging innate and adaptive immunity, were analyzed alongside Vα7.2+/CD161- T cells in peripheral blood samples from 14 atopic dermatitis patients and 10 healthy controls. Flow cytometry and machine learning algorithms were employed for a comprehensive analysis. The results indicate a significant decrease in MAIT cells and CD69 subsets in atopic dermatitis, coupled with elevated CD38 and polyfunctional MAIT cells producing TNFα and Granzyme B (TNFα+/GzB+). Vα7.2+/CD161- T cells in atopic dermatitis exhibited a decrease in CD8 and IFNγ-producing subsets but an increase in CD38 activated and IL-22-producing subsets. These results highlight the distinctive features of MAIT cells and Vα7.2+/CD161- T cells and their different roles in the pathogenesis of atopic dermatitis and provide insights into their potential roles in immune-mediated skin diseases.


Subjects
Atopic Dermatitis, Mucosal-Associated Invariant T Cells, Humans, Flow Cytometry, Tumor Necrosis Factor alpha, Healthy Volunteers
14.
Soc Sci Med ; 346: 116720, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38452490

ABSTRACT

BACKGROUND: Comprehensively measuring the outcomes of interventions and policy programmes impacting both health and broader areas of quality of life (QoL) is important for decision-making within and across sectors. Increasingly, broad QoL measures are being developed to capture outcomes beyond health-related quality of life (HRQoL). Jointly exploring the dimensionality of diverse instruments can improve our understanding about their evaluative space and how they conceptually build on each other. This study explored the measurement relationship between five broader QoL measures and the most widely used HRQoL measure, the EQ-5D. METHODS: Participants from the Dutch general population (n = 1002) completed six instruments (n = 126 items) in December of 2020. The measurement relationship was explored using qualitative and quantitative dimensionality assessment methods. This included a content analysis and exploratory factor analyses which were used to develop a confirmatory factor model of the broader QoL dimensions. Correlations between the identified dimensions and self-reported overall health and wellbeing were also explored. RESULTS: The final CFA model exhibited acceptable/good fit and described 12 QoL dimensions: 'psychological symptoms', 'social relations', 'physical functioning', 'emotional resilience', 'pain', 'cognition', 'financial needs', 'discrimination', 'outlook on life/growth', 'access to public services', 'living environment', and 'control over life'. All dimensions were positively correlated to self-reported health and wellbeing, but the magnitudes in associations varied considerably (e.g., 'pain' had the strongest correlation with overall health but a weak correlation with wellbeing). CONCLUSIONS: This study contributes to a broader understanding of QoL by exploring the dimensionality and relationships among various QoL measures. A number of the dimensions identified are HRQoL-focused, with others covering broader constructs. 
Our findings offer insights for the development of comprehensive instruments, or use of instrument suites that capture multidimensional aspects of QoL. Further research should explore the relevance and feasibility/appropriateness of measuring the identified dimensions in different settings and populations.


Subjects
Emotions, Quality of Life, Humans, Quality of Life/psychology, Surveys and Questionnaires
15.
Neuron ; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38452763

ABSTRACT

The brain's remarkable properties arise from the collective activity of millions of neurons. Widespread application of dimensionality reduction to multi-neuron recordings implies that neural dynamics can be approximated by low-dimensional "latent" signals reflecting neural computations. However, can such low-dimensional representations truly explain the vast range of brain activity, and if not, what is the appropriate resolution and scale of recording to capture them? Imaging neural activity at cellular resolution and near-simultaneously across the mouse cortex, we demonstrate an unbounded scaling of dimensionality with neuron number in populations up to 1 million neurons. Although half of the neural variance is contained within sixteen dimensions correlated with behavior, our discovered scaling of dimensionality corresponds to an ever-increasing number of neuronal ensembles without immediate behavioral or sensory correlates. The activity patterns underlying these higher dimensions are fine grained and cortex wide, highlighting that large-scale, cellular-resolution recording is required to uncover the full substrates of neuronal computations.

16.
Curr Biol ; 34(7): 1519-1531.e4, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38531360

ABSTRACT

How are we able to learn new behaviors without disrupting previously learned ones? To understand how the brain achieves this, we used a brain-computer interface (BCI) learning paradigm, which enables us to detect the presence of a memory of one behavior while performing another. We found that learning to use a new BCI map altered the neural activity that monkeys produced when they returned to using a familiar BCI map in a way that was specific to the learning experience. That is, learning left a "memory trace" in the primary motor cortex. This memory trace coexisted with proficient performance under the familiar map, primarily by altering neural activity in dimensions that did not impact behavior. Forming memory traces might be how the brain is able to provide for the joint learning of multiple behaviors without interference.


Subjects
Brain-Computer Interfaces, Motor Cortex, Learning, Brain, Brain Mapping, Electroencephalography
17.
Proc Natl Acad Sci U S A ; 121(12): e2317284121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38478692

ABSTRACT

Since its emergence in late 2019, SARS-CoV-2 has diversified into a large number of lineages and caused multiple waves of infection globally. Novel lineages have the potential to spread rapidly and internationally if they have higher intrinsic transmissibility and/or can evade host immune responses, as has been seen with the Alpha, Delta, and Omicron variants of concern. They can also cause increased mortality and morbidity if they have increased virulence, as was seen for Alpha and Delta. Phylogenetic methods provide the "gold standard" for representing the global diversity of SARS-CoV-2 and to identify newly emerging lineages. However, these methods are computationally expensive, struggle when datasets get too large, and require manual curation to designate new lineages. These challenges provide a motivation to develop complementary methods that can incorporate all of the genetic data available without down-sampling to extract meaningful information rapidly and with minimal curation. In this paper, we demonstrate the utility of using algorithmic approaches based on word-statistics to represent whole sequences, bringing speed, scalability, and interpretability to the construction of genetic topologies. While not serving as a substitute for current phylogenetic analyses, the proposed methods can be used as a complementary, and fully automatable, approach to identify and confirm new emerging variants.
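A word-statistics representation of whole sequences typically means a vector of k-mer counts, with sequence similarity computed directly on the vectors. A minimal sketch (the specific word length and similarity measure used by the authors may differ):

```python
from collections import Counter
import math

def kmer_profile(seq, k=3):
    """Word-statistics representation: counts of all overlapping k-mers."""
    return Counter(seq[i : i + k] for i in range(len(seq) - k + 1))

def cosine_similarity(p, q):
    """Cosine similarity between two k-mer count profiles."""
    keys = set(p) | set(q)
    dot = sum(p[w] * q[w] for w in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)
```

Because no alignment or tree inference is needed, such representations scale to all available genomes without down-sampling, which is the speed and scalability advantage the abstract highlights over phylogenetic methods.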


Subjects
COVID-19, SARS-CoV-2, Humans, SARS-CoV-2/genetics, COVID-19/epidemiology, Phylogeny, Machine Learning
18.
J Big Data ; 11(1): 43, 2024.
Article in English | MEDLINE | ID: mdl-38528850

ABSTRACT

Modern deep learning training procedures rely on model regularization techniques such as data augmentation methods, which generate training samples that increase the diversity of data and richness of label information. A popular recent method, mixup, uses convex combinations of pairs of original samples to generate new samples. However, as we show in our experiments, mixup can produce undesirable synthetic samples, where the data is sampled off the manifold and can contain incorrect labels. We propose ζ-mixup, a generalization of mixup with provably and demonstrably desirable properties that allows convex combinations of T≥2 samples, leading to more realistic and diverse outputs that incorporate information from T original samples by using a p-series interpolant. We show that, compared to mixup, ζ-mixup better preserves the intrinsic dimensionality of the original datasets, which is a desirable property for training generalizable models. Furthermore, we show that our implementation of ζ-mixup is faster than mixup, and extensive evaluation on controlled synthetic and 26 diverse real-world natural and medical image classification datasets shows that ζ-mixup outperforms mixup, CutMix, and traditional data augmentation techniques. The code will be released at https://github.com/kakumarabhishek/zeta-mixup.
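The p-series interpolant can be sketched as follows. This is an illustrative reading of the idea, with an assumed weight exponent and without the published method's label handling; consult the linked repository for the authors' implementation.

```python
import numpy as np

def zeta_mixup_weights(T, gamma=2.8):
    """Normalized p-series weights w_i proportional to i^(-gamma), i = 1..T.
    Larger gamma keeps each synthetic sample closer to one original sample.
    The value gamma=2.8 is an assumption for illustration."""
    w = np.arange(1, T + 1, dtype=float) ** -gamma
    return w / w.sum()

def zeta_mixup_batch(X, gamma=2.8, seed=0):
    """Synthesize one new sample per original sample as a convex combination
    of the whole batch, each with its own random ordering of the weights."""
    rng = np.random.default_rng(seed)
    T = len(X)
    w = zeta_mixup_weights(T, gamma)
    out = np.empty_like(X, dtype=float)
    for i in range(T):
        perm = rng.permutation(T)
        out[i] = (w[:, None] * X[perm]).sum(axis=0)
    return out
```

Because the weights are nonnegative and sum to one, every synthetic sample lies in the convex hull of the batch, and the rapidly decaying p-series keeps it near the dominant original sample, which is how the method avoids sampling far off the data manifold.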

19.
Diagnostics (Basel) ; 14(6)2024 Mar 16.
Article in English | MEDLINE | ID: mdl-38535052

ABSTRACT

BACKGROUND: Identifying active lesions in magnetic resonance imaging (MRI) is crucial for the diagnosis and treatment planning of multiple sclerosis (MS). Active lesions on MRI are identified following the administration of Gadolinium-based contrast agents (GBCAs). However, recent studies have reported that repeated administration of GBCA results in the accumulation of Gd in tissues. In addition, GBCA administration increases health care costs. Thus, reducing or eliminating GBCA administration for active lesion detection is important for improved patient safety and reduced healthcare costs. Current state-of-the-art methods for identifying active lesions in brain MRI without GBCA administration utilize data-intensive deep learning methods. OBJECTIVE: To implement nonlinear dimensionality reduction (NLDR) methods, locally linear embedding (LLE) and isometric feature mapping (Isomap), which are less data-intensive, for automatically identifying active lesions on brain MRI in MS patients, without the administration of contrast agents. MATERIALS AND METHODS: Fluid-attenuated inversion recovery (FLAIR), T2-weighted, proton density-weighted, and pre- and post-contrast T1-weighted images were included in the multiparametric MRI dataset used in this study. Subtracted pre- and post-contrast T1-weighted images were labeled by experts as active lesions (ground truth). Unsupervised methods, LLE and Isomap, were used to reconstruct multiparametric brain MR images into a single embedded image. Active lesions were identified on the embedded images and compared with ground truth lesions. The performance of NLDR methods was evaluated by calculating the Dice similarity (DS) index between the observed and identified active lesions in embedded images. RESULTS: LLE and Isomap were applied to 40 MS patients, achieving median DS scores of 0.74 ± 0.1 and 0.78 ± 0.09, respectively, outperforming current state-of-the-art methods.
CONCLUSIONS: NLDR methods, Isomap and LLE, are viable options for the identification of active MS lesions on non-contrast images, and potentially could be used as a clinical decision tool.
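The pipeline described in this abstract can be sketched in a few lines: embed each voxel's multiparametric channels into a single value with LLE or Isomap, threshold the embedded image, and score the result against a ground-truth mask with the Dice similarity index. This is a minimal illustration on synthetic data, not the paper's pipeline; the voxel count, neighbor count, and thresholding rule are all assumptions.

```python
# Hedged sketch: LLE/Isomap embedding of multiparametric voxel features plus
# Dice similarity scoring. Synthetic data; all parameter choices are assumed.
import numpy as np
from sklearn.manifold import Isomap, LocallyLinearEmbedding

rng = np.random.default_rng(0)
# Each row is one voxel; columns stand in for the multiparametric channels
# (e.g., FLAIR, T2-weighted, PD-weighted, pre-contrast T1-weighted).
voxels = rng.normal(size=(500, 4))

# Reduce the 4 channels to a single embedded intensity per voxel.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=1)
iso = Isomap(n_neighbors=10, n_components=1)
emb_lle = lle.fit_transform(voxels).ravel()
emb_iso = iso.fit_transform(voxels).ravel()

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity index: 2|A ∩ B| / (|A| + |B|)."""
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

# Toy lesion masks: threshold the embedding (an assumption) against a
# synthetic ground-truth mask.
truth = rng.random(500) > 0.9
pred = emb_iso > np.quantile(emb_iso, 0.9)
print(f"Dice (Isomap, toy data): {dice(pred, truth):.2f}")
```

A Dice score of 1.0 means the predicted and ground-truth masks coincide exactly; the abstract's median scores of 0.74 and 0.78 sit on this 0-to-1 scale.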

20.
J Comput Chem; 45(15): 1193-1214, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38329198

ABSTRACT

This paper (i) explores the internal structure of two quantum mechanics datasets (QM7b, QM9), composed of several thousand organic molecules described in terms of electronic properties, and (ii) further explores an inverse design approach to molecular design consisting of using machine learning methods to approximate the atomic composition of molecules, using QM9 data. Understanding the structure and characteristics of this kind of data is important when predicting the atomic composition from physicochemical properties in inverse molecular design. Intrinsic dimension analysis, clustering, and outlier detection methods were used in the study. They revealed that for both datasets the intrinsic dimensionality is several times smaller than the number of descriptive dimensions. The QM7b data is composed of well-defined clusters related to atomic composition. The QM9 data consists of an outer region predominantly composed of outliers and an inner, core region that concentrates clustered inlier objects. A significant relationship exists between the number of atoms in a molecule and its outlier/inlier nature, and the spatial structure is related to molecular weight. Despite the structural differences between the two datasets, the predictability of the variables of interest for inverse molecular design is high. This is exemplified by models estimating the number of atoms in a molecule from both the original properties and from lower-dimensional embedding spaces. In the generative approach, the input is a set of desired properties of the molecule and the output is an approximation of the atomic composition in terms of its constituent chemical elements. This could serve as the starting region for further search in the huge space of possible chemical compounds. The study uses the QM9 quantum mechanics dataset, composed of 133,885 small organic molecules and 19 electronic properties.
Different multi-target regression approaches were considered for predicting the atomic composition from the properties, including feature engineering techniques in an auto-machine-learning framework. High-quality models were found that predict the atomic composition of the molecules from their electronic properties, as well as from a feature subset of only 52.6% of the original size. Feature selection worked better than feature generation. The results validate the generative approach to inverse molecular design.
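The multi-target regression setup from this abstract, mapping a molecule's electronic properties to the counts of its constituent elements, can be sketched as below. The data is synthetic (a noisy linear ground truth standing in for QM9), and the model choice (a random forest, which handles multiple targets natively) is an assumption for illustration, not the paper's exact method.

```python
# Hedged sketch: multi-target regression from electronic properties to
# atomic composition (counts of 4 chemical elements per molecule).
# Synthetic data; the model and all sizes except n_props are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_mol, n_props = 400, 19          # 19 electronic properties, as in QM9
props = rng.normal(size=(n_mol, n_props))

# Toy ground truth: atom counts as a noisy linear function of the properties,
# rounded to integers and clipped at zero.
W = rng.normal(size=(n_props, 4))
atoms = np.clip(
    np.rint(props @ W + 10 + rng.normal(scale=0.5, size=(n_mol, 4))), 0, None
)

X_tr, X_te, y_tr, y_te = train_test_split(props, atoms, random_state=0)
# RandomForestRegressor accepts a 2-D target array, so one model predicts
# all four element counts jointly.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("predicted composition shape:", pred.shape)
```

Feature selection, which the abstract reports worked better than feature generation, would slot in here as a column subset of `props` before fitting.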
